Skip to content

Conversation

@flakey5
Copy link
Member

@flakey5 flakey5 commented Mar 6, 2025

Switches over to using the new doc generation tooling. For more background on this, please see #52343

Currently a draft just to get feedback on the approach to this integration.

cc @nodejs/web-infra

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/nodejs-website
  • @nodejs/web-infra

@nodejs-github-bot nodejs-github-bot added build Issues and PRs related to build files or the CI. doc Issues and PRs related to the documentations. needs-ci PRs that need a full CI run. tools Issues and PRs related to the tools directory. windows Issues and PRs related to the Windows platform. labels Mar 6, 2025
@flakey5 flakey5 marked this pull request as draft March 6, 2025 06:24
@flakey5 flakey5 force-pushed the flakey5/20250305/api-docs-tooling branch from 77ede22 to 3423c21 Compare March 6, 2025 06:29
@flakey5 flakey5 force-pushed the flakey5/20250305/api-docs-tooling branch from 3423c21 to 451f8a7 Compare March 6, 2025 06:31
ovflowd

This comment was marked as outdated.

@flakey5 flakey5 force-pushed the flakey5/20250305/api-docs-tooling branch 3 times, most recently from cf2609b to a3ce99d Compare March 10, 2025 22:04
@flakey5 flakey5 marked this pull request as ready for review March 10, 2025 22:05
@flakey5

This comment was marked as resolved.

@flakey5

This comment was marked as resolved.

@ovflowd

This comment was marked as resolved.

@codecov
Copy link

codecov bot commented Mar 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.78%. Comparing base (1ad04e2) to head (3b3a9e2).
⚠️ Report is 33 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #57343      +/-   ##
==========================================
- Coverage   89.80%   89.78%   -0.03%     
==========================================
  Files         672      672              
  Lines      203907   203755     -152     
  Branches    39203    39167      -36     
==========================================
- Hits       183121   182940     -181     
- Misses      13113    13128      +15     
- Partials     7673     7687      +14     

see 38 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@araujogui

This comment was marked as resolved.

@araujogui

This comment was marked as resolved.

@ovflowd

This comment was marked as resolved.

Copy link
Member

@ovflowd ovflowd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the result of many months of arduous work between many awesome folks, including @flakey5 @AugustinMauroy @araujogui @ovflowd @avivkeller and others.

I'm so proud of what we are achieving here and this is a huge step towards a modern tooling and a revamped API docs within Node.js

Approving, as I believe this is ready!

@ovflowd

This comment was marked as resolved.

Copy link
Member

@lpinca lpinca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RSLGTM because it is hard to review and outside of my comfort zone.

Copy link
Contributor

@aduh95 aduh95 Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be any test change for the JSON output, such changes should be introduced in separate PRs. #57343 (comment) still stands

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this. We include test changes in PRS that change things all the time. There's no reason this PR should be singled out requiring the tests to be moved to a different pr

Copy link
Contributor

@aduh95 aduh95 Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me rephrase: my problem is not that a PR touches a test, my problem is that this PR touches a doctool test1. I'm -1 on all the changes that break this test, which was written by me2 with the explicit goal of giving actionable way to address #57343 (comment). Sending them in a separate PRs is offered as way to unblock this PR, but I would probably still oppose those changes if presented to me in separate PRs (e.g. what justification could there be that the number of deprecated sections doesn't match? I'm open to being convinced, but there has been no attempt to do so, the test was simply updated without any explanation – and at this point, I'm not asking to be convinced, I'm asking for the new tool to align with the current one, so we can move forward and discuss changes later).

Footnotes

  1. and that's not news, I've had already expressed that back in March last year in #57343 (comment)

  2. test-doc-api-json.mjs was added in #61413

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to being convinced, but there has been no attempt to do so, the test was simply updated without any explanation

That unfortunately came from @avivkeller assumption that all we needed was to update the tests. It didn't come with ill intention, and such commits can be reverted in this PR.

Also, you never asked to be convinced or asked on Slack why the diffs presented in the tests are fine, you blocked the PR without even inviting a discussion. Just stated I'm strongly against this change.

There's also a misconception being told here, the structure/spec of the current JSON and the one from the new tooling is a 1:1. Now that there are some extra fields or values in arrays, that's due as we explaining the current tooling has bugs on its parsing of the JSON and doc-kit addresses those AST and Regex bugs.

The only team Im aware that consumes this is the DefinitelyTyped team, we can just ask them if the JSON generated by the new tooling causes any issues to them.

Also, at the very least the new doc-kit is a semver minor because it will only start affecting any new release of Node.js

Also, if we are concerned about JSON not being fully 1:1 character by character, why aren't we concerned with the HTML then? What guarantees we have that the diff in the test breaks anything?

Now, I believe @flakey5 and @avivkeller are the best to explain what these test failure diffs are and why they are harmless.

Copy link
Member

@panva panva Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only team Im aware that consumes this is the DefinitelyTyped team

That would primarily be @Renegade334, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only team Im aware that consumes this is the DefinitelyTyped team

That would primarily be @Renegade334, right?

I don't recall, but my team had many meetings with the folks maintaining @types/node to the point that it became clear they don't even use right now the JSON file to automate @types/node, most of the work is manual and they only use certain parts of the generated JSON file at the moment.

doc-kit is supposed to introduce in the future a new JSON format that can help teams to almost fully automatically generate the types/node package, but that's offtopic for now 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we are concerned about JSON not being fully 1:1 character by character

You mischaracterising my concerns, the test I've written doesn't check for the structure to be 1:1, it only ensures consistency on a very limited set of criteria for the JSON output. You're the only one who claimed 1:1 match, and AFAIK no one is asking for it.

This comment was marked as off-topic.

@avivkeller avivkeller added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Jan 27, 2026
@mcollina
Copy link
Member

We discussed this during today's TSC meeting and the general feeling is that the JSON output should stay the same for ease of rollout. Changing it would probably be a semver-major change, which we should avoid for this.

@ovflowd
Copy link
Member

ovflowd commented Jan 28, 2026

We discussed this during today's TSC meeting and the general feeling is that the JSON output should stay the same for ease of rollout. Changing it would probably be a semver-major change, which we should avoid for this.

I chatted with @jasnell today, and he called for another TSC meeting to be done in the following days. There are points that got presented to him that make him believe we shouldn't do this.

@ovflowd
Copy link
Member

ovflowd commented Jan 28, 2026

I also find it personally oblivious that the TSC meeting happens without any representation from the Web Infra side, which makes this a heavily one-sided argument.

@aduh95
Copy link
Contributor

aduh95 commented Jan 28, 2026

@ovflowd do you want to join? We're still in the meeting, in case you're available right now

@ovflowd
Copy link
Member

ovflowd commented Jan 28, 2026

@ovflowd do you want to join? We're still in the meeting, in case you're available right now

I unfortunately gotta go to gym now 😅 -- I don't want to make a big fuss over this, just want to properly be able to have my concerns heard. I was able to explain my points to James, IRL (he's in Germany right now) and he completely understood them. So I feel quorom is needed. In the end of the day, if the TSC leans over "agreement with Antoine", I completely understand that and respect the decision 🙇

@avivkeller
Copy link
Member

I've added this to the Web Team agenda as well, so that we may discuss our stances on the matter more collectively.

@jasnell
Copy link
Member

jasnell commented Jan 29, 2026

I also had to miss the call as I'm in Munich for a company meeting. I also object to the decision and believe it needs more consideration and discussion with members of the web infra team present to offer counter arguments and discussion. At the very least @ovflowd and @flakey5 should have been explicitly invited to participate in this part of the discussion. I'm going to ask that we revisit this in the next TSC call that is in a timezone we can attend. /CC @nodejs/TSC

@mcollina
Copy link
Member

I also find it personally oblivious that the TSC meeting happens without any representation from the Web Infra side, which makes this a heavily one-sided argument.

To clarify: there was no question asked to the TSC in this issue, and there was a bit of context provided by @aduh95. There is still no question.

I was able to explain my points to James, IRL (he's in Germany right now) and he completely understood them.

I’m glad you did! But still leaves most of the @nodejs/tsc in the dark.

As this was presented:

  1. there is a breaking change in the output JSON format
  2. users are relying on that format for automation

The most natural thing to ask is if it could be made non-breaking. Land as semver-major if not. Given this id an undocumented feature of Node.js, you can ask the TSC to decide if this is covered by the semversiness guarantees.

Can you please formalize that argument?

@panva
Copy link
Member

panva commented Jan 29, 2026

I am not under the impression that these are breaking changes since this deals with documentation, not runtime. We change documentation all the time - we reorder html elements, add paragraphs, remove nodes, add change entries, add entire new pages, you name it. Why would the api.json need to be treated differently, just because it may be consumed programmatically?

As @mcollina said, there was no ask to the TSC presented here, a label was added without one, there's a blocking review from @aduh95 that the authors have yet to engage with. The actual changes are not clear to me but if the test/doctool/test-doc-api-json.mjs diff is any indication of those changes they are not out of line with how I expect documentation to be treated.

@aduh95
Copy link
Contributor

aduh95 commented Jan 29, 2026

Why would the api.json need to be treated differently, just because it may be consumed programmatically?

Yes exactly, and also such change should be introduced separately, so collaborators can review whether it is indeed an improvement, and whether it has potential to break the ecosystem. I would add that the current state makes it needlessly hard to review because the generated JSON is not human readable. In #57343 (comment), @ovflowd claimed the JSON structure would be a 1:1 match, so I don't think it's an unrealistic expectation to start with not breaking the existing tests (albeit, test-doc-api-json.mjs was added long after this PR was opened, with the explicit goal of detecting potential breaking changes).

FWIW I really think that we should make this PR backportable, otherwise we would make our lifes harder if we have to maintain two very different toolings in LTS branches vs main.

@ovflowd
Copy link
Member

ovflowd commented Jan 29, 2026

I’m glad you did! But still leaves most of the @nodejs/tsc in the dark.

Yeah, my bad, that's on us. I just assumed that we would have the call together and explain all things there, this was clearly miscommunication from our side, for that, Im sorry!

@ovflowd
Copy link
Member

ovflowd commented Jan 29, 2026

FWIW I really think that we should make this PR backportable, otherwise we would make our lifes harder if we have to maintain two very different toolings in LTS branches vs main.

I think there is merit for you to explain what exactly is not backported right now or breaking right now or actually provide information on why these test failures are actually breaking changes on a test suite that was introduced a few days ago for algo that was never covered in tests. Sure, we agreed that adding tests on nodejs/node was a pre-requisite to land this PR, but I never imagined it would also be used as a forceful baseline, those tests are not absolute and their failures don't tell the whole story 😅

@ovflowd
Copy link
Member

ovflowd commented Jan 29, 2026

To be fair, not trying to argue Antoine for the sake of it, but I believe we need to properly present our points (both sides, and back our claims), and how we would address these issues. Because what is being asked from Antoine is that we change the existing legacy tooling so that its JSON output becomes 1:1 of doc-kit... And then we can replace it to doc-kit and that just doesn't make sense to me.

If the concern is the fact that the output is not 1:1 character by character would be an issue then regardless of it being doc-kit or not.

Plus the amount of effort to update the legacy tooling which has very legacy code, has the risks of breaking even more things that are probably not covered. It is just a high risk, low reward and high effort ask... just to then be immediately replaced by doc-kit.

@aduh95
Copy link
Contributor

aduh95 commented Jan 29, 2026

@ovflowd if you want to make a list of the breaking changes introduced in this PR and the justification for each, reviewers would have the needed context for assessing if it's OK to land and to backport. IMO it would be more productive to try to remove those breaking changes first (you claim it's hard to do, that doesn't reflect well on the flexibility of doc-kit, but maybe I'm missing some context), as there would be no need to assess the potential breakage.

}

assert.strictEqual(numberOfDeprecatedSections, 39); // Increase this number every time a new API is deprecated.
assert.strictEqual(numberOfDeprecatedSections, 44); // Increase this number every time a new API is deprecated.
Copy link
Member

@joyeecheung joyeecheung Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell @aduh95 is not saying that the output must be character by character same as the previous one. But that changes like this line isn’t supposed to happen in this PR and needs explanation - either there is a bug in the test parsing for depreciations, then this can be explained (and ideally these explanations should be summarised) in the PR which also serves archeological purposes as to why a doc tool change increase the number of depreciations by 5. Also for better backporting the test fixes should
be done in a separate PR preceding this one instead. Or there is a bug in doc-kit and it should be fixed before landing. Either way, it should be looked into and explained instead of being ignored completely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessarily the test suite being broken, it is more of the legacy tooling being broken and generating the list of deprecations incorrectly for the JSON file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you list the deprecations before and after the doc tool change? If the older list shows that the current doc tools are broken, I think it can be justified to land the test change together with the bug fix. Although I think the debate here isn't necessarily that the old list must be maintained, but that there is a lack of explanation about exactly what changed beyond a high level "the old tool was broken". That can be a reasonable argument, but it requires more details regarding "exactly how it was broken, and what were the 5 depreciations that were missing (or the diff if it's not strictly just adding more)"

Copy link
Member

@ovflowd ovflowd Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are absolutely right. And to be fair, I think we should have done a better job on listing what changed and why on this PR description. It is indeed a black box for the remaining of the reviewers.

cc @avivkeller / @flakey5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've explained these changes, including the deprecations, on the main thread

@avivkeller
Copy link
Member

Hey everyone! I spent the time a few weeks ago and went through every git diff change of the documentation JSON, and every change that I noted was expected (on my end).

While I understand that isn't much reassurance, I'll gladly spend more time documenting each breaking change to give y'all the "needed context" of why the tests needed to be changed.

There is still no question.

That's my fault. My understanding was that if reviewers and PR authors could not reach an agreement, TSC escalation was required, but I should've formally explained the concerns.

The argument (on behalf of the web-infra side of this PR) for the TSC is: To what extent does this change need to be 1:1 with the old documentation? Currently, the structure is mostly the same, but several bugs in the old documentation RegExes and parses were squashed. Thus, the JSON is a more accurate representation of the markdown, but breaks the tests for being slightly different.

you claim it's hard to do, that doesn't reflect well on the flexibility of doc-kit, but maybe I'm missing some context

The reason why it's hard for us to make this exactly 1:1, is because it requires reverting those bug fixes, which would then break all the other generators we've spent many months working on. (i.e. making the JSON 1:1 will make the HTML not 1:1, since it relies on those bugs being non existent)

If this PR must be semver-major, I'm perfectly fine with that. Yes, it would require maintaining the legacy tooling for a few more months, but if it would resolve all the concerns, I think it's a step in the right direction.

list of the breaking changes

I'm not in front of an API git diff at the moment, so this list is definitely not exhaustive, but off the top of my head, breaking changes (which are mostly bug fixes) include:

  • More accurate sub-header splitting (hence, the part of the test that checks synopsis.md changed)
  • optional status identification (hence, the part of the test that checks keys needed to add optional)
  • the new api top-level property (hence, the part of the test that checks keys needed to add api)
  • more accurate identification of stability alert boxes (hence, the part of the test that identifies deprecated changes needed to increase the expected count)
  • JSON is minified (this is easy to revert; hence, the newline check was removed)

@ovflowd

This comment has been minimized.

@ovflowd

This comment has been minimized.

@jasnell
Copy link
Member

jasnell commented Jan 29, 2026

Let's take a break on the conversation here and take a breath. We'll surface the conversation again on the TSC agenda and we'll make sure that the web infra team is represented in that conversation this time.

@jasnell
Copy link
Member

jasnell commented Jan 29, 2026

Can you please formalize that argument?

Let's do this in the TSC call. I think it's best for this PR discussion to take a cool off period. Folks are getting too frustrated.

@joyeecheung
Copy link
Member

joyeecheung commented Jan 29, 2026

My 2 cents: it seems @ovflowd and @aduh95 agree that #57343 (comment) is a path forward. In that case I'd say the work can continue on GitHub. The TSC meeting is not set up for making decisions, it's mostly a venue for FYI to TSC members because it's otherwise hard to get everyone's attention or test water temperatures quickly. (Case in point - I was not in the last meeting and I generally do not prioritize the TSC meeting in my schedule. I either join the meeting for that FYI when I have some free time, or check the agenda for FYI which serves equally for what I think the TSC meeting serves. I think I am not the only one that treats the TSC meeting this way given the low turnout of the TSC meetings in general).

If we can agree on "we don't necessarily need the JSON output to be the same, just that changes to behaviors verified by tests should be minimized and if not, some detailed explanation about what's being changed here is warranted. The doc-kit contributors are looking into either minimizing or explaining the test changes.", then this one paragraph summary may be enough for the next TSC meeting.

@avivkeller
Copy link
Member

avivkeller commented Jan 29, 2026

const allExpectedKeys = new Set([
   'added',
+  'api',
   'changes',
   'classes',
   'classMethods',
...
   'miscs',
   'modules',
   'name',
+  'optional',
   'napiVersion',
   'options',
   'params',

The new tooling adds an api top-level property, identical to the name of the file, i.e. fs or async_hooks. This property, while added, does not change the otherwise older structure of the JSON.

The new tooling also adds an optional property, denoting whether a property is optional (i.e. many options properties), such as filehandle.createWriteStream([options]).


-  assert.strictEqual(fileContent.split('\n', 3).length, 3);

The new tooling minifies JSON.


   if (dirent.name !== 'index.md') {
     assert.ok(json.introduced_in || Object.values(json).at(-1)?.[0].introduced_in);
+    assert.partialDeepStrictEqual(allExpectedKeys, findAllKeys(json));
   }
-  assert.partialDeepStrictEqual(allExpectedKeys, findAllKeys(json));

The new tooling generates a complete JSON structure for index.md (i.e. https://nodejs-api-docs-tooling.vercel.app/index.json). The old tooling generates an almost empty JSON. While this new structure is valid, it does not satisfy the requirements in findAllKeys, as this markdown file isn't really a module API docs (and as such, has blank properties [i.e. textRaw] which would be filled in a typical document).


-  assert.deepStrictEqual(Object.keys(json), ['type', 'source', ...({
+
+  assert.deepStrictEqual(Object.keys(json).sort(), ['type', 'api', 'source', ...({
     'addons.md': ['introduced_in', 'miscs'],
     'cli.md': ['introduced_in', 'miscs'],
     'debugger.md': ['introduced_in', 'stability', 'stabilityText', 'miscs'],
...
     'errors.md': ['introduced_in', 'classes', 'miscs'],
     'esm.md': ['introduced_in', 'meta', 'stability', 'stabilityText', 'properties', 'miscs'],
     'globals.md': ['introduced_in', 'stability', 'stabilityText', 'classes', 'methods', 'miscs'],
-    'index.md': [],
     'intl.md': ['introduced_in', 'miscs'],
     'n-api.md': ['introduced_in', 'stability', 'stabilityText', 'miscs'],
     'packages.md': ['introduced_in', 'meta', 'miscs'],
     'process.md': ['globals'],
+    'synopsis.md': ['introduced_in', 'miscs'],
     'report.md': ['introduced_in', 'stability', 'stabilityText', 'meta', 'miscs'],
-  }[dirent.name] ?? ['modules'])]);
+  }[dirent.name] ?? ['modules'])].sort());
 }

See aforementioned comment on why index.md has the modules section.

For synopsis.md, the new tooling places "Example" in a separate entry than "Usage and Example", and thus, has additional properties.


-assert.strictEqual(numberOfDeprecatedSections, 39); // Increase this number every time a new API is deprecated.
+assert.strictEqual(numberOfDeprecatedSections, 44); // Increase this number every time a new API is deprecated.

Finally, the number of deprecated sections. The new tooling correctly identifies many methods that were previously ignored by the legacy tooling, including deprecated ones (shown below):

{
  "name": "Buffer",
  "textRaw": "`new Buffer(array)`"
}
{
  "name": "Buffer",
  "textRaw": "`new Buffer(arrayBuffer[, byteOffset[, length]])`"
}
{
  "name": "Buffer",
  "textRaw": "`new Buffer(buffer)`"
}
{
  "name": "Buffer",
  "textRaw": "`new Buffer(size)`"
}
{
  "name": "Buffer",
  "textRaw": "`new Buffer(string[, encoding])`"
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build Issues and PRs related to build files or the CI. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. doc Issues and PRs related to the documentations. needs-ci PRs that need a full CI run. tools Issues and PRs related to the tools directory. tsc-agenda Issues and PRs to discuss during the meetings of the TSC. web-agenda

Projects

None yet

Development

Successfully merging this pull request may close these issues.